Hierarchical Reinforcement Learning: A Hybrid Approach

Authors

  • Malcolm Ross Kinsella Ryan
  • Claude Sammut
Abstract

In this thesis we investigate the relationships between the symbolic and subsymbolic methods used for controlling agents by artificial intelligence, focusing in particular on methods that learn. In light of the strengths and weaknesses of each approach, we propose a hybridisation of symbolic and subsymbolic methods to capitalise on the best features of each. We implement such a hybrid system, called Rachel, which incorporates techniques from Teleo-Reactive Planning, Hierarchical Reinforcement Learning and Inductive Logic Programming (ILP). Rachel uses a novel representation of behaviours, Reinforcement-Learnt Teleo-operators (RL-Tops), which defines a behaviour in terms of its desired consequences but leaves the implementation of the policy to be learnt by reinforcement learning. An RL-Top is an abstract, symbolic description of the purpose of a behaviour, and is used by Rachel both as a planning operator and as the definition of a reward function by which the behaviour can be learnt. Two new hierarchical reinforcement learning algorithms are introduced: Planned Hierarchical Semi-Markov Q-Learning (P-HSMQ) and Teleo-Reactive Q-Learning (TRQ). The former is an extension of the Hierarchical Semi-Markov Q-Learning algorithm to use computer-generated plans in place of task hierarchies (which are commonly provided by the trainer). The latter is a further elaboration of the algorithm to include more intelligent behaviour termination. The knowledge contained in the plan is used to determine when an executing behaviour is no longer appropriate and can be prematurely terminated, resulting in more efficient policies. Incomplete descriptions of the effects of behaviours can lead the planner to make false assumptions in building plans. As behaviours are learnt, not implemented, not every effect of actions can be known in advance. Rachel implements a “reflector” which monitors for such unexpected and unwanted side-effects. Using ILP, it learns to predict when they will occur, and so repair its plans to avoid them. Together, the components of Rachel form a learning system which is able to receive abstract descriptions of behaviours, build plans to discover which of them may be useful to achieve its goals, learn concrete policies and optimal choices of behaviour through trial and error, discover and predict any unwanted side-effects that result and repair its plans to avoid them. It is a demonstration
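The idea that an RL-Top serves both as a planning operator and as the definition of a reward function can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the class name `RLTop`, the representation of a state as a set of symbolic facts, the reward values (+1, -1, 0), and the helper `smdp_q_update`, which is the standard semi-Markov Q-learning update rather than the exact P-HSMQ or TRQ rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical state representation: the set of symbolic facts that currently hold.
State = FrozenSet[str]


@dataclass(frozen=True)
class RLTop:
    """Illustrative RL-Top: a symbolic description of a behaviour's purpose."""
    name: str
    precondition: Callable[[State], bool]   # where the behaviour is applicable
    postcondition: Callable[[State], bool]  # the desired consequence

    def terminated(self, state: State) -> bool:
        # The behaviour ends when it succeeds or leaves its applicable region.
        return self.postcondition(state) or not self.precondition(state)

    def reward(self, next_state: State) -> float:
        # Assumed reward scheme derived from the symbolic description:
        # success is rewarded, leaving the precondition without success is penalised.
        if self.postcondition(next_state):
            return 1.0
        if not self.precondition(next_state):
            return -1.0
        return 0.0


QTable = Dict[Tuple[State, str], float]


def smdp_q_update(q: QTable, state: State, behaviour: str,
                  discounted_return: float, duration: int,
                  next_state: State, candidates: Iterable[str],
                  alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard SMDP Q-learning update for a behaviour that ran `duration` steps."""
    best_next = max((q.get((next_state, b), 0.0) for b in candidates), default=0.0)
    target = discounted_return + (gamma ** duration) * best_next
    old = q.get((state, behaviour), 0.0)
    q[(state, behaviour)] = old + alpha * (target - old)
    return q[(state, behaviour)]


# Hypothetical behaviour: reach the door while staying inside the room.
go_to_door = RLTop(
    name="go_to_door",
    precondition=lambda s: "in_room" in s,
    postcondition=lambda s: "at_door" in s,
)
```

In this sketch a planner would read `precondition` and `postcondition` as an ordinary operator description, while the learner would use `reward` and `terminated` to train the low-level policy. The choice among behaviours at the higher level would then be updated with something like `smdp_q_update`, with `candidates` restricted to the behaviours the plan considers appropriate in `next_state`.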

Similar resources

Hierarchical Functional Concepts for Knowledge Transfer among Reinforcement Learning Agents

This article introduces the notions of functional space and concept as a way of knowledge representation and abstraction for Reinforcement Learning agents. These definitions are used as a tool of knowledge transfer among agents. The agents are assumed to be heterogeneous; they have different state spaces but share the same dynamics, reward and action space. In other words, the agents are assumed t...

Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning

In order to establish autonomous behavior for technical systems, the well-known trade-off between reactive control and deliberative planning has to be considered. Within this paper, we combine both principles by proposing a two-level hierarchical reinforcement learning scheme to enable the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior repr...

Using Abstract Models of Behaviours to Automatically Generate Reinforcement Learning Hierarchies

In this paper we present a hybrid system combining techniques from symbolic planning and reinforcement learning. Planning is used to automatically construct task hierarchies for hierarchical reinforcement learning based on abstract models of the behaviours’ purpose, and to perform intelligent termination improvement when an executing behaviour is no longer appropriate. Reinforcement learning is...

Hybrid Cooperative Agents with Online Reinforcement Learning for Traffic Control

This paper presents the application of a fuzzy-neuroevolutionary hybrid system with online reinforcement learning for intelligent road traffic management and control. Taking a step away from the conventional traffic control system, the hybrid system presents different methodologies in knowledge acquisition, decision-making, learning and goal formulation with the use of a three-layered hierarchical...

Publication date: 2002